@chaokunyang chaokunyang commented Jan 7, 2026

Why?

Java doesn't have native unsigned integer types, but many other languages (Rust, Go, C++, Python with ctypes) do. When serializing data across languages, we need to properly handle unsigned integers to ensure correct values and efficient encoding.

For example:

  • A Rust u32 with value 3_000_000_000 cannot be directly represented in Java's signed int (max 2,147,483,647)
  • Variable-length encoding for unsigned integers can skip zigzag encoding overhead
  • Cross-language compatibility requires proper unsigned type support in the protocol
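The zigzag point can be illustrated with a minimal sketch. It assumes a LEB128-style varint (7 payload bits per byte, low bits first); Fory's exact byte layout may differ, and the helper names `zigzag32` and `encode_varuint` are hypothetical, chosen only for this illustration.

```python
def zigzag32(n: int) -> int:
    """Map a signed 32-bit integer to an unsigned one (0, -1, 1, -2 -> 0, 1, 2, 3)."""
    return ((n << 1) ^ (n >> 31)) & 0xFFFFFFFF


def encode_varuint(n: int) -> bytes:
    """Encode a non-negative integer, 7 payload bits per byte, low bits first.

    Assumption: a LEB128-style layout, used here only to compare sizes.
    """
    if n < 0:
        raise ValueError("varuint requires a non-negative value")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


value = 100
unsigned_size = len(encode_varuint(value))          # unsigned path: no zigzag step
signed_size = len(encode_varuint(zigzag32(value)))  # signed path: zigzag first
print(unsigned_size, signed_size)  # prints "1 2"
```

Zigzag spends one bit on the sign, so a signed varint covers only 0..63 in a single byte, while an unsigned varint covers 0..127. A field known to be non-negative gains that range back by skipping the zigzag step.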

What does this PR do?

1. Adds Unsigned Integer Type Support (All Languages)

  • New type IDs: UINT8 (9), UINT16 (10), UINT32 (11), VAR_UINT32 (12), UINT64 (13), VAR_UINT64 (14), TAGGED_UINT64 (15)
  • Unsigned types use the same bit width as signed types but interpret values in the unsigned range
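As a rough illustration of "same bit width, different interpretation", the sketch below reinterprets one 32-bit pattern both ways. The function names are hypothetical and this is not Fory code; it only mirrors what a Java `int` would hold for a `u32` written by another language.

```python
import struct


def as_java_int(u32: int) -> int:
    """View a u32 bit pattern as a signed 32-bit integer (what a Java int holds)."""
    return struct.unpack("<i", struct.pack("<I", u32))[0]


def as_unsigned(i32: int) -> int:
    """Recover the unsigned value, similar to Java's Integer.toUnsignedLong."""
    return i32 & 0xFFFFFFFF


stored = as_java_int(3_000_000_000)
print(stored)               # -1294967296
print(as_unsigned(stored))  # 3000000000
```

The four bytes are identical in both cases; only the reading of the top bit changes. That is why the type ID has to record whether a field is signed or unsigned.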

2. Renames Type Constants for Clarity (All Languages)

  • VAR32 → VARINT32
  • VAR64 → VARINT64
  • H64 → TAGGED_INT64
  • VARU32 → VAR_UINT32
  • VARU64 → VAR_UINT64
  • HU64 → TAGGED_UINT64

3. Java: Adds Type Annotations for Field-Level Control

New annotations allow specifying exact encoding at field level:

  • @Uint8Type - Mark field as unsigned 8-bit [0, 255]
  • @Uint16Type - Mark field as unsigned 16-bit [0, 65535]
  • @Uint32Type(compress=true/false) - Unsigned 32-bit with optional varint encoding
  • @Uint64Type(encoding=FIXED/VARINT/TAGGED) - Unsigned 64-bit with encoding options
  • @Int32Type(compress=true/false) - Signed 32-bit with optional varint encoding
  • @Int64Type(encoding=FIXED/VARINT/TAGGED) - Signed 64-bit with encoding options

4. C++: Adds FORY_FIELD_CONFIG Macro for Field Encoding Control

struct MyStruct {
  uint32_t fixed_count;
  uint64_t var_id;
  uint64_t tagged_ts;
  FORY_FIELDS_INFO(MyStruct, fixed_count, var_id, tagged_ts);
};

// Configure encoding per field
FORY_FIELD_CONFIG(MyStruct,
  (fixed_count, Encoding::Fixed),    // Use fixed 4-byte UINT32
  (var_id, Encoding::Varint),        // Use VAR_UINT64
  (tagged_ts, Encoding::Tagged)      // Use TAGGED_UINT64
);

5. Rust: Extends #[fory(...)] Derive Macro with Encoding Attributes

#[derive(Fory)]
struct MyStruct {
    #[fory(compress = false)]           // Use fixed INT32 instead of VARINT32
    fixed_id: i32,
    
    #[fory(encoding = "tagged")]        // Use TAGGED_INT64
    tagged_ts: i64,
    
    #[fory(encoding = "varint")]        // Use VAR_UINT64 (default for u64)
    var_count: u64,
    
    #[fory(encoding = "fixed")]         // Use fixed UINT64
    fixed_count: u64,
}

6. Go: Extends Struct Tags with compress and encoding Options

type MyStruct struct {
    FixedI32   int32   `fory:"compress=false"`        // Use fixed INT32
    VarI32     int32   `fory:"encoding=varint"`       // Use VARINT32 (default)
    FixedU32   uint32  `fory:"encoding=fixed"`        // Use fixed UINT32
    TaggedI64  int64   `fory:"encoding=tagged"`       // Use TAGGED_INT64
    VarU64     uint64  `fory:"encoding=varint"`       // Use VAR_UINT64 (default)
    FixedU64   uint64  `fory:"encoding=fixed"`        // Use fixed UINT64
}

Options:

  • compress=true/false: For int32/uint32, controls varint vs fixed encoding
  • encoding=varint/fixed/tagged: For all numeric types, explicitly sets encoding
    • int32/uint32: "varint" (default) or "fixed"
    • int64/uint64: "varint" (default), "fixed", or "tagged"

7. Python: Adds Type Hints for Encoding Control

from dataclasses import dataclass

from pyfory.types import (
    int32, fixed_int32,           # VARINT32 vs INT32
    int64, fixed_int64, tagged_int64,  # VARINT64 vs INT64 vs TAGGED_INT64
    uint32, fixed_uint32,         # VAR_UINT32 vs UINT32
    uint64, fixed_uint64, tagged_uint64,  # VAR_UINT64 vs UINT64 vs TAGGED_UINT64
)

@dataclass
class MyStruct:
    var_id: int32            # Uses VARINT32
    fixed_id: fixed_int32    # Uses fixed INT32
    tagged_ts: tagged_int64  # Uses TAGGED_INT64
    var_count: uint64        # Uses VAR_UINT64
    fixed_count: fixed_uint64  # Uses fixed UINT64

8. Java Internal Changes

  • DispatchId System: New DispatchId class handles type dispatching in code generation
  • ObjectCodecBuilder: Handles boxed dispatch IDs for non-nullable boxed fields
  • Type ID Unification: Java native mode now shares type IDs (BOOL through STRING) with xlang mode

Related issues

Closes #3110
Closes #2914
#3099
#1017
#2906
#2982

Does this PR introduce any user-facing change?

  • Does this PR introduce any public API change?

    • Java: New annotations @Uint8Type, @Uint16Type, @Uint32Type, @Uint64Type, @Int32Type, @Int64Type
    • C++: New FORY_FIELD_CONFIG macro for encoding configuration
    • Rust: New compress and encoding attributes in #[fory(...)] derive macro
    • Go: New compress and encoding options in struct tags
    • Python: New type hints (fixed_int32, tagged_int64, uint32, etc.)
    • All: Renamed type constants (e.g., VAR32 → VARINT32)
  • Does this PR introduce any binary protocol compatibility change?

    • Adds new type IDs for unsigned integers (9-15)
    • Existing signed integer encoding remains compatible

Benchmark

N/A - This PR focuses on correctness and cross-language compatibility. Performance characteristics of unsigned types are similar to their signed counterparts.

@chaokunyang chaokunyang changed the title feat(java/xlang): support unsigned types for java feat(java/xlang): support unsigned types for java/python/xlang Jan 7, 2026
@chaokunyang chaokunyang force-pushed the support_unsigned_types_for_java branch from 1c1ae07 to 43b7783 Compare January 7, 2026 10:10
@chaokunyang chaokunyang force-pushed the support_unsigned_types_for_java branch from 85d3b9b to af664fd Compare January 9, 2026 11:00
@chaokunyang chaokunyang changed the title feat(java/xlang): support unsigned types for java/python/xlang feat(xlang): support serialization for unsigned types and field encoding config Jan 10, 2026
@chaokunyang chaokunyang mentioned this pull request Jan 10, 2026
17 tasks
- Add missing Apache license header to DispatchId.java
- Fix ClassCastException in DefaultValueUtils.setDefaultValues by using
  Number interface for type conversion instead of direct casts

…rackingRef is false

When global ref tracking is enabled, serializers call reference() at the end
of deserialization. If a field has trackingRef=false (e.g., in xlang mode where
all fields default to trackingRef=false), we need to push a stub -1 via
preserveRefId() so that reference() can pop it and skip setReadObject.

The fix checks if the TYPE normally needs ref tracking (ignoring field-level
metadata) by using TypeRef.of(typeRef.getRawType()). This ensures the stub is
pushed when needed, preventing ArrayIndexOutOfBoundsException when the serializer
calls reference() on an empty readRefIds stack.
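
The stack imbalance described in this commit message can be modeled with a toy sketch. This is not Fory's implementation: `ToyRefReader` and `read_field` are hypothetical, and the method names only mirror `preserveRefId()`, `reference()`, and `setReadObject` from the message above.

```python
class ToyRefReader:
    """Toy model of the read-side ref-id stack; not Fory's actual implementation."""

    NO_REF = -1  # stub meaning "this object is not ref-tracked"

    def __init__(self):
        self.read_ref_ids = []   # stack of preserved ref ids
        self.read_objects = {}   # ref id -> deserialized object

    def preserve_ref_id(self, ref_id):
        self.read_ref_ids.append(ref_id)

    def reference(self, obj):
        # Serializers call this at the end of deserialization.
        ref_id = self.read_ref_ids.pop()  # IndexError if nothing was preserved
        if ref_id != self.NO_REF:
            self.read_objects[ref_id] = obj  # stands in for setReadObject


def read_field(reader, tracking_ref, push_stub):
    """Read one field; push_stub toggles the behavior the fix introduces."""
    if tracking_ref:
        reader.preserve_ref_id(len(reader.read_objects))
    elif push_stub:
        reader.preserve_ref_id(ToyRefReader.NO_REF)  # keep the stack balanced
    value = ["payload"]
    reader.reference(value)
    return value


reader = ToyRefReader()
read_field(reader, tracking_ref=False, push_stub=True)  # balanced: stub pushed, then popped
try:
    read_field(reader, tracking_ref=False, push_stub=False)
except IndexError:
    print("pop from empty stack")  # the toy analogue of ArrayIndexOutOfBoundsException
```

In this model every `reference()` call needs exactly one preceding push, so an untracked field must still push something. The stub value lets `reference()` tell "nothing to register" apart from a real ref id.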
Use Types.getTypeId() instead of ClassResolver registered IDs for
determining dispatch IDs in DefaultValueUtils. This ensures consistent
type IDs between DispatchId constants and the values used in setDefaultValues.

Also convert default values to correct types during initialization to
avoid repeated type conversion at runtime.
@chaokunyang chaokunyang merged commit 724fece into apache:main Jan 10, 2026
59 checks passed
chaokunyang added a commit that referenced this pull request Jan 11, 2026
## Why?



## What does this PR do?

Fix performance regression introduced in #3113

## Related issues

#3113

## Does this PR introduce any user-facing change?



- [ ] Does this PR introduce any public API change?
- [ ] Does this PR introduce any binary protocol compatibility change?

## Benchmark
Development

Successfully merging this pull request may close these issues.

[Xlang] Add unsigned int8/16/32/64 to xlang serialization spec
